Imitation Learning with Demonstrations and Shaping Rewards

نویسندگان

  • Kshitij Judah
  • Alan Fern
  • Prasad Tadepalli
  • Robby Goetschalckx
چکیده

Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has lead to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improve the learning efficiency of IL by providing a shaping reward function in addition to the usual demonstrations. Shaping rewards are numeric functions of states (and possibly actions) that are generally easily specified, and capture general principles of desired behavior, without necessarily completely specifying the behavior. Shaping rewards have been used extensively in reinforcement learning, but have been seldom considered for IL, though they are often easy to specify. Our main contribution is to propose an IL approach that learns from both shaping rewards and demonstrations. We demonstrate the effectiveness of the approach across several IL problems, even when the shaping reward is not fully consistent with the demonstrations.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Imitation in Reinforcement Learning

The promise of imitation is to facilitate learning by allowing the learner to observe a teacher in action. Ideally this will lead to faster learning when the expert knows an optimal policy. Imitating a suboptimal teacher may slow learning, but it should not prevent the student from surpassing the teacher’s performance in the long run. Several researchers have looked at imitation in the context ...

متن کامل

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay me...

متن کامل

Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

Imitation learning has traditionally been applied to learn a single task from demonstrations thereof. The requirement of structured and isolated demonstrations limits the scalability of imitation learning approaches as they are difficult to apply to real-world scenarios, where robots have to be able to execute a multitude of tasks. In this paper, we propose a multi-modal imitation learning fram...

متن کامل

Do children with autism re - enact

It has been suggested that autism-specific imitative deficits may be reduced or even spared in object-related activities. However, most previous research has not sufficiently distinguished object movement reenactment (learning about the ways in which object move) from imitation (learning about the topography of demonstrated actions). 20 children with autism (CWA) and 20 typically developing chi...

متن کامل

Active Imitation Learning of Hierarchical Policies

In this paper, we study the problem of imitation learning of hierarchical policies from demonstrations. The main difficulty in learning hierarchical policies by imitation is that the high level intention structure of the policy, which is often critical for understanding the demonstration, is unobserved. We formulate this problem as active learning of Probabilistic State-Dependent Grammars (PSDG...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014